
    The Importance of Clipping in Neurocontrol by Direct Gradient Descent on the Cost-to-Go Function and in Adaptive Dynamic Programming

    In adaptive dynamic programming, neurocontrol and reinforcement learning, the objective is for an agent to learn to choose actions so as to minimise a total cost function. In this paper we show that when discretized time is used to model the motion of the agent, it can be very important to perform "clipping" on the motion of the agent in the final time step of the trajectory. By clipping we mean that the final time step of the trajectory is truncated so that the agent stops exactly at the first terminal state reached, and no distance further. We demonstrate that when clipping is omitted, learning performance can fail to reach the optimum, and when clipping is done properly, learning performance can improve significantly. The clipping problem we describe affects algorithms which use explicit derivatives of the model functions of the environment to calculate a learning gradient. These include Backpropagation Through Time for Control, and methods based on Dual Heuristic Dynamic Programming. However, the clipping problem does not significantly affect methods based on Heuristic Dynamic Programming, Temporal Differences or Policy Gradient Learning algorithms. Similarly, the clipping problem does not affect fixed-length finite-horizon problems.
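    A minimal sketch of the clipping idea, assuming hypothetical 1-D dynamics and a quadratic running cost (neither is specified by the abstract): the final discrete time step is truncated so the trajectory ends exactly at the first terminal state reached, and the final step's cost is scaled by the fraction of the step actually taken.

        def rollout(x0, policy, dt=0.1, x_term=1.0, clip=True):
            """Integrate a 1-D trajectory until x crosses the terminal state x_term."""
            x, cost = x0, 0.0
            while x < x_term:
                u = policy(x)              # action; assumed to drive the agent toward x_term
                step = u * dt
                if clip and x + step >= x_term:
                    frac = (x_term - x) / step     # fraction of the final step actually taken
                    cost += u**2 * dt * frac       # accrue running cost only up to the boundary
                    x = x_term
                else:
                    cost += u**2 * dt
                    x += step
            return cost

        # Without clipping, the final step overshoots the terminal state, so the
        # accumulated cost (and any gradient taken through it) is biased.
        print(rollout(0.0, lambda x: 0.7, clip=True),
              rollout(0.0, lambda x: 0.7, clip=False))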

    The Divergence of Reinforcement Learning Algorithms with Value-Iteration and Function Approximation

    This paper gives specific divergence examples of value iteration for several major Reinforcement Learning and Adaptive Dynamic Programming algorithms when a function approximator is used for the value function. These divergence examples differ from previous divergence examples in the literature in that they apply to a greedy policy, i.e. a "value iteration" scenario. Perhaps surprisingly, with a greedy policy it is also possible to obtain divergence for the algorithms TD(1) and Sarsa(1). In addition to these divergences, we also obtain divergence for the Adaptive Dynamic Programming algorithms HDP, DHP and GDHP.
    Comment: 8 pages, 4 figures. In Proceedings of the IEEE International Joint Conference on Neural Networks, June 2012, Brisbane (IEEE IJCNN 2012), pp. 3070--307
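    The paper's greedy-policy counterexamples are not reproduced here, but the flavour of value divergence under function approximation can be seen in the classic two-state construction of Tsitsiklis and Van Roy, sketched below. Both states transition to the second state and all rewards are zero, so the true value function is zero everywhere, yet uniform (off-policy) state sampling makes the TD(0) weight grow without bound for gamma > 5/6:

        import numpy as np

        phi = np.array([1.0, 2.0])     # features of states s1, s2; V(s) ~ w * phi(s)
        gamma, alpha, w = 0.95, 0.01, 1.0
        for sweep in range(2000):
            for s in (0, 1):           # uniform (off-policy) sampling of both states
                td_error = 0.0 + gamma * w * phi[1] - w * phi[s]
                w += alpha * td_error * phi[s]     # semi-gradient TD(0) update
        print(w)                       # ~1e6 here; |w| diverges whenever gamma > 5/6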

    Baseline win rates for neural-network based trading algorithms

    Neural networks and other machine-learning systems are used to create automatic financial forecasting and trading systems. To aid comparison of such systems, there is a need for reliable performance metrics. One such metric is the win rate. We show how, in certain circumstances, the win-rate statistic can be very misleading, and to counter this we propose and define baseline win rates for comparison. We develop empirical and closed-form models for such baselines and validate them against financial data and a neural forecaster.
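    The paper's closed-form baseline model is not reproduced here, but the empirical idea can be sketched: estimate the baseline as the win rate of random-entry trades on the same price series. The long-only trade with a fixed holding period below is an assumption made for this sketch.

        import numpy as np

        def win_rate(returns):
            """Fraction of trades with strictly positive return."""
            return (np.asarray(returns) > 0).mean()

        def empirical_baseline(prices, hold=10, n_trials=10_000, seed=0):
            """Monte Carlo win rate of random-entry, fixed-holding-period long trades."""
            rng = np.random.default_rng(seed)
            prices = np.asarray(prices)
            entries = rng.integers(0, len(prices) - hold, size=n_trials)
            rets = prices[entries + hold] / prices[entries] - 1.0
            return win_rate(rets)

        # On a gently drifting random walk the baseline sits well above 50%,
        # which is why a raw win rate on trending data can mislead.
        prices = np.cumprod(1 + np.random.default_rng(1).normal(0.0005, 0.01, 5000))
        print(empirical_baseline(prices))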

    Practical Game Design Tool: State Explorer

    This paper introduces a computer-game design tool which enables game designers to explore and develop game mechanics for arbitrary game systems. The tool is implemented as a plugin for the Godot game engine. It allows the designer to view an abstraction of a game’s states while the game is in active development, and to quickly explore which states are navigable from which other states. This information is used to rapidly explore, validate and improve the design of the game. The tool is most practical for game systems which are computer-explorable within roughly 2000 states. The tool is demonstrated by presenting how it was used to create a small, yet complete, commercial game.
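    The plugin itself targets the Godot engine; the underlying reachability idea is a breadth-first enumeration of the game's state graph, sketched here in Python with "successors" standing in for the game's move function (a name chosen for this sketch):

        from collections import deque

        def explore(initial_state, successors, cap=2000):
            """Breadth-first enumeration of the reachable state graph, capped at cap states."""
            seen, graph, queue = {initial_state}, {}, deque([initial_state])
            while queue and len(seen) < cap:
                s = queue.popleft()
                graph[s] = successors(s)           # states navigable from s in one move
                for nxt in graph[s]:
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
            return graph

    A designer can then scan the returned adjacency map for unreachable, dead-end or otherwise unintended states, which is the kind of validation the tool supports interactively.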

    Systems, Methods and Devices for Vector Control of Permanent Magnet Synchronous Machines using Artificial Neural Networks

    An example method for controlling an AC electrical machine can include providing a PWM converter operably connected between an electrical power source and the AC electrical machine and providing a neural network vector control system operably connected to the PWM converter. The control system can include a current-loop neural network configured to receive a plurality of inputs. The current-loop neural network can be configured to optimize the compensating dq-control voltage. The inputs can be d- and q-axis currents, d- and q-axis error signals, predicted d- and q-axis current signals, and a feedback compensating dq-control voltage. The d- and q-axis error signals can be the difference between the d- and q-axis currents and reference d- and q-axis currents, respectively. The method can further include outputting a compensating dq-control voltage from the current-loop neural network and controlling the PWM converter using the compensating dq-control voltage.
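    The abstract fixes the network's inputs and output but not its internal architecture; the sketch below assumes a single tanh hidden layer purely for illustration.

        import numpy as np

        def current_loop_nn(i_dq, i_dq_ref, i_dq_pred, v_dq_fb, W1, b1, W2, b2):
            """One forward pass of the current-loop network: from the d/q currents,
            their error signals, predicted currents and the fed-back compensating
            voltage to a new compensating dq control voltage for the PWM converter."""
            err_dq = i_dq_ref - i_dq                       # d- and q-axis error signals
            x = np.concatenate([i_dq, err_dq, i_dq_pred, v_dq_fb])
            h = np.tanh(W1 @ x + b1)                       # assumed hidden layer
            return W2 @ h + b2                             # compensating v_d, v_q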

    Deep Learning in Target Space

    Deep learning uses neural networks which are parameterised by their weights. The neural networks are usually trained by tuning the weights to directly minimise a given loss function. In this paper we propose to re-parameterise the weights into targets for the firing strengths of the individual nodes in the network. Given a set of targets, it is possible to calculate the weights which make the firing strengths best meet those targets. It is argued that using targets for training addresses the problem of exploding gradients, by a process which we call cascade untangling, and makes the loss-function surface smoother to traverse, leading to easier and faster training and potentially better generalisation of the neural network. It also allows for easier learning of deeper and recurrent network structures. The necessary conversion of targets to weights comes at an extra computational expense, which is in many cases manageable. Learning in target space can be combined with existing neural-network optimisers, for extra gain. Experimental results demonstrate the speed of target-space learning and examples of improved generalisation for fully-connected and convolutional networks, as well as the ability of recurrent networks to recall and process long time sequences and perform natural-language processing.
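    The targets-to-weights conversion described above is, per layer, a linear least-squares problem; a minimal sketch for one layer follows (the sizes and the bias column are illustrative assumptions):

        import numpy as np

        rng = np.random.default_rng(0)
        X  = rng.normal(size=(64, 10))         # a batch of inputs to the layer
        T1 = rng.normal(size=(64, 32))         # trainable targets for the layer's firing strengths

        A0 = np.hstack([X, np.ones((64, 1))])  # append a bias column
        W1, *_ = np.linalg.lstsq(A0, T1, rcond=None)   # weights that best meet the targets
        H1 = np.tanh(A0 @ W1)                  # the layer's actual firing strengths

        # Training then does gradient descent on T1 (and the other layers' targets),
        # recomputing the weights from the targets as above at each step.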

    Simple and fast calculation of the second-order gradients for globalized dual heuristic dynamic programming in neural networks

    We derive an algorithm to exactly calculate the mixed second-order derivatives of a neural network's output with respect to its input vector and weight vector. This is necessary for the adaptive dynamic programming (ADP) algorithms globalized dual heuristic programming (GDHP) and value-gradient learning. The algorithm calculates the inner product of this second-order matrix with a given fixed vector in a time that is linear in the number of weights in the neural network. We use a "forward accumulation" of the derivative calculations, which produces a much more elegant and easy-to-implement solution than has previously been published for this task. In doing so, the algorithm makes GDHP simple to implement and efficient, bridging the gap between the widely used DHP and GDHP ADP methods.
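    The paper's exact recursion is not reproduced here, but the flavour of forward accumulation can be shown on a one-hidden-layer net: propagate a directional derivative with respect to the weights (direction V1, vb1, v2) alongside the forward pass, giving the product of the mixed second-order matrix with that fixed vector at the cost of roughly one extra forward pass.

        import numpy as np

        rng = np.random.default_rng(0)
        n, h = 5, 7
        x = rng.normal(size=n)
        W1, b1, w2 = rng.normal(size=(h, n)), rng.normal(size=h), rng.normal(size=h)
        V1, vb1, v2 = rng.normal(size=(h, n)), rng.normal(size=h), rng.normal(size=h)

        def input_gradient(W1, b1, w2):
            """g = dy/dx for the network y = w2 . tanh(W1 x + b1)."""
            z = np.tanh(W1 @ x + b1)
            return W1.T @ (w2 * (1 - z**2))

        # Forward accumulation: carry R{.}, the directional derivative w.r.t. the weights.
        z  = np.tanh(W1 @ x + b1)
        s  = 1 - z**2                    # tanh'
        Ra = V1 @ x + vb1                # R{pre-activations}
        Rs = -2 * z * s * Ra             # R{tanh'}, using tanh'' = -2 tanh tanh'
        Rg = V1.T @ (w2 * s) + W1.T @ (v2 * s + w2 * Rs)   # (d2y/dx dw) . v

        # Check against central finite differences in the same weight direction.
        eps = 1e-6
        gp = input_gradient(W1 + eps*V1, b1 + eps*vb1, w2 + eps*v2)
        gm = input_gradient(W1 - eps*V1, b1 - eps*vb1, w2 - eps*v2)
        print(np.max(np.abs(Rg - (gp - gm) / (2 * eps))))  # agrees to roundoff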